Large-scale Controlled Vocabulary Indexing for Named Entities

Author

  • Mark Wasson
Abstract

A large-scale controlled vocabulary indexing system is described. The system currently covers almost 70,000 named entity topics, and applies to documents from thousands of news publications. Topic definitions are built through substantially automated knowledge engineering.

Similar resources

Indexing and Comparison of Multi-Dimensional Entities in a Recommender System based on Ontological Approach

The paper describes an application of indexing—the technology currently widely used for processing and comparing textual information—to multi-dimensional entities of knowledge domains. We propose a model for building a frame-based ontology, which contains a domain conceptual model as well as a controlled vocabulary of “base terms” used for indexing. Further, the ontology constitutes the structu...

OOV Sensitive Named-Entity Recognition in Speech

Named Entity Recognition (NER), an information extraction task, is typically applied to spoken documents by cascading a large vocabulary continuous speech recognizer (LVCSR) and a named entity tagger. Recognizing named entities in automatically decoded speech is difficult since LVCSR errors can confuse the tagger. This is especially true of out-of-vocabulary (OOV) words, which are often named e...

Bilingual Indexing for Information Retrieval with AUTINDEX

AUTINDEX is a bilingual automatic indexing system for the two languages German and English. It is being developed within the EU-funded BINDEX project. The aim of the system is to automatically index large quantities of abstracts of scientific and technical papers from several areas of engineering. Automatic indexing takes place using a controlled vocabulary provided in monolingual and bilingual...

Automatic algorithm selection for MeSH Heading indexing based on meta-learning

We present a methodology that automatically selects indexing algorithms for each heading in MeSH, NLM’s vocabulary for indexing MEDLINE. While manually comparing indexing methods is manageable with a limited number of MeSH headings, a large number of them makes automation of this selection desirable. Results show that this process can be automated based on previously indexed MEDLINE records. We...

Bibliographic database access using free-text and controlled vocabulary: an evaluation

This paper evaluates and compares the retrieval effectiveness of various search models, based on either automatic text-word indexing or on manually assigned controlled descriptors. Retrieval is from a relatively large collection of bibliographic material written in French. Moreover, for this French collection we evaluate improvements that result from combining automatic and manual indexing. Fir...

Publication date: 2000